Figure 6: Globin Family

Generate a sequence of symbols by designating each amino acid within a sequence of amino acids with a symbol, where an amino acid is designated a first symbol if it is a member of a predetermined set of amino acids, and a second symbol different from the first symbol if the amino acid is not a member of the set.

Determine which signals of the symbols are present in the sequence of symbols, where a signal is a window of the sequence of symbols consisting of a predefined number of contiguous symbols.
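The two steps above can be sketched as follows. This is a minimal illustration, not the patented method: the predetermined set `HYDROPHOBIC`, the symbols `'1'`/`'0'`, and the helper names `to_symbols` and `signals_present` are hypothetical choices, and the window length `k` stands in for the "predefined number of contiguous symbols".

```python
from typing import FrozenSet, Set

# Hypothetical predetermined set (a common hydrophobic grouping), for illustration only.
HYDROPHOBIC: FrozenSet[str] = frozenset("AVLIMFWC")


def to_symbols(seq: str, test_set: FrozenSet[str] = HYDROPHOBIC) -> str:
    """Designate each residue '1' (first symbol) if in the set, else '0' (second symbol)."""
    return "".join("1" if aa in test_set else "0" for aa in seq.upper())


def signals_present(symbols: str, k: int) -> Set[str]:
    """Return every distinct window of k contiguous symbols (a 'signal')."""
    return {symbols[i:i + k] for i in range(len(symbols) - k + 1)}


if __name__ == "__main__":
    s = to_symbols("MVLSPADKTN")  # arbitrary toy peptide string
    print(s)
    print(sorted(signals_present(s, 3)))
```

With a binary alphabet there are at most 2^k distinct signals, so the set of signals present is a compact fingerprint of the sequence.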

Transform each amino acid within a collection of protein sequences into symbols, where an amino acid is designated a first symbol if it is a member of a first test set and a second symbol different from the first symbol if the amino acid is not a member of the first test set, to produce a collection of sequences of symbols.

Determine the number of occurrences of different signals of the symbols in the collection of sequences of symbols, where a signal is a window of a sequence of symbols consisting of a predefined number of contiguous symbols.
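Counting signal occurrences across a collection might look like the sketch below. It assumes, hypothetically, a hydrophobic first test set and a sliding window that counts overlapping windows; `count_signals` is an illustrative name, not from the source.

```python
from collections import Counter
from typing import FrozenSet, Iterable

# Hypothetical first test set, for illustration only.
HYDROPHOBIC: FrozenSet[str] = frozenset("AVLIMFWC")


def to_symbols(seq: str, test_set: FrozenSet[str] = HYDROPHOBIC) -> str:
    """Map each residue to '1' if it is in the test set, else '0'."""
    return "".join("1" if aa in test_set else "0" for aa in seq.upper())


def count_signals(proteins: Iterable[str], k: int,
                  test_set: FrozenSet[str] = HYDROPHOBIC) -> Counter:
    """Count occurrences of each k-symbol signal (overlapping windows) over all sequences."""
    counts: Counter = Counter()
    for seq in proteins:
        sym = to_symbols(seq, test_set)
        for i in range(len(sym) - k + 1):
            counts[sym[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    print(count_signals(["MVLSPADKTN", "AAAA"], 3))
```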

Determine the probability that the distribution of the number of signals of each signal strength occurs by chance; the lower the probability, the more useful the test set of amino acids is for protein analysis.
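The source does not say how this probability is computed, so the sketch below swaps in one plausible technique: a Monte Carlo shuffle test. It assumes that the relevant statistic is the chi-square distance between observed signal counts and counts expected if symbols were independent, and that "by chance" means "after shuffling each sequence's symbols" (which preserves composition). All function names are hypothetical, and the input must contain at least one symbol.

```python
import itertools
import random
from collections import Counter
from typing import List


def window_counts(symbol_seqs: List[str], k: int) -> Counter:
    """Count overlapping k-symbol windows across all symbol sequences."""
    counts: Counter = Counter()
    for s in symbol_seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def chi_square_vs_iid(symbol_seqs: List[str], k: int) -> float:
    """Chi-square distance of observed window counts from an independent-symbol model."""
    joined = "".join(symbol_seqs)
    p = joined.count("1") / len(joined)  # assumes at least one symbol overall
    n = sum(max(len(s) - k + 1, 0) for s in symbol_seqs)
    obs = window_counts(symbol_seqs, k)
    stat = 0.0
    for bits in itertools.product("01", repeat=k):
        w = "".join(bits)
        ones = w.count("1")
        expected = n * p ** ones * (1 - p) ** (k - ones)
        if expected > 0:
            stat += (obs[w] - expected) ** 2 / expected
    return stat


def chance_probability(symbol_seqs: List[str], k: int,
                       trials: int = 1000, seed: int = 0) -> float:
    """Estimate P(shuffled statistic >= observed) with an add-one Monte Carlo estimator."""
    rng = random.Random(seed)
    observed = chi_square_vs_iid(symbol_seqs, k)
    hits = 0
    for _ in range(trials):
        shuffled = []
        for s in symbol_seqs:
            chars = list(s)
            rng.shuffle(chars)  # keeps each sequence's symbol composition
            shuffled.append("".join(chars))
        if chi_square_vs_iid(shuffled, k) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

Under this sketch, two candidate test sets would be compared by converting the same protein collection with each and preferring the one with the smaller estimated probability.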

Designate each amino acid within a family of protein sequences with a symbol, wherein an amino acid is designated a first symbol if it is a member of a predetermined set of amino acids, and a second symbol different from the first symbol if the amino acid is not a member of the predetermined set, thereby producing a plurality of sequences of symbols.

Determine which signals of the symbols are present in the sequences of symbols, wherein a signal is a window of the sequence of symbols consisting of a predefined number of contiguous symbols.

Determine a conserved signal pattern among members of the family.
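The family steps above could be sketched as below, under stated assumptions: a member's "signal pattern" is taken to be the set of signals present in it, and a signal counts as conserved if it appears in at least a fraction `min_fraction` of members (1.0 means every member). The set `HYDROPHOBIC` and all function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Set

# Hypothetical predetermined set, for illustration only.
HYDROPHOBIC: FrozenSet[str] = frozenset("AVLIMFWC")


def to_symbols(seq: str, test_set: FrozenSet[str] = HYDROPHOBIC) -> str:
    """Map each residue to '1' if it is in the set, else '0'."""
    return "".join("1" if aa in test_set else "0" for aa in seq.upper())


def signal_pattern(seq: str, k: int, test_set: FrozenSet[str] = HYDROPHOBIC) -> Set[str]:
    """Set of distinct k-symbol signals present in one protein."""
    sym = to_symbols(seq, test_set)
    return {sym[i:i + k] for i in range(len(sym) - k + 1)}


def conserved_pattern(family: List[str], k: int, min_fraction: float = 1.0,
                      test_set: FrozenSet[str] = HYDROPHOBIC) -> Set[str]:
    """Signals present in at least min_fraction of the family members."""
    patterns = [signal_pattern(s, k, test_set) for s in family]
    presence = Counter(sig for p in patterns for sig in p)
    cutoff = min_fraction * len(patterns)
    return {sig for sig, n in presence.items() if n >= cutoff}
```

A `min_fraction` below 1.0 tolerates family members that have lost a signal, at the cost of admitting signals that are only common rather than conserved.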

Analyze a query protein to identify a signal pattern.

Determine the level of similarity between the query protein's signal pattern and the conserved signal pattern of the family.

Designate the query as having the fold of the family if the signal pattern of the query exceeds a threshold level of similarity with the conserved signal pattern of the family.
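The query-classification steps might be sketched as follows. The source names no similarity measure, so this sketch assumes Jaccard similarity between signal sets and a hypothetical `threshold` of 0.5; `has_family_fold` and the other names are illustrative only. "Exceeds" is read as strictly greater than the threshold.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical predetermined set, for illustration only.
HYDROPHOBIC: FrozenSet[str] = frozenset("AVLIMFWC")


def signal_pattern(seq: str, k: int, test_set: FrozenSet[str] = HYDROPHOBIC) -> Set[str]:
    """Set of distinct k-symbol signals in the query after binary symbol conversion."""
    sym = "".join("1" if aa in test_set else "0" for aa in seq.upper())
    return {sym[i:i + k] for i in range(len(sym) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def has_family_fold(query: str, conserved: Set[str], k: int, threshold: float = 0.5,
                    test_set: FrozenSet[str] = HYDROPHOBIC) -> Tuple[bool, float]:
    """Return (assigned to family fold?, similarity score)."""
    similarity = jaccard(signal_pattern(query, k, test_set), conserved)
    return similarity > threshold, similarity
```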

Calculate the expected signal pattern distribution (SPD).
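The source does not define how the expected SPD is calculated. One simple model, assumed here, treats symbols as independent with probability p of being the first symbol; a signal w with m first symbols among its k positions is then expected (L − k + 1) · p^m · (1 − p)^(k − m) times in a sequence of length L. The function name `expected_spd` is hypothetical.

```python
from itertools import product
from typing import Dict


def expected_spd(length: int, k: int, p: float) -> Dict[str, float]:
    """Expected count of every k-symbol signal under an independent-symbol model."""
    n_windows = max(length - k + 1, 0)
    spd: Dict[str, float] = {}
    for bits in product("01", repeat=k):
        w = "".join(bits)
        ones = w.count("1")
        spd[w] = n_windows * p ** ones * (1 - p) ** (k - ones)
    return spd
```

Because the 2^k signal probabilities sum to 1, the expected counts always sum to the number of windows, which is a quick sanity check on any implementation.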

Transform amino acid sequence(s) into symbols.

Identify significant signals in each translation.
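A possible sketch of these two steps, with assumptions stated plainly: the set `HYDROPHOBIC` is hypothetical, and a signal is called significant if its observed count is improbably high under a binomial model with the expected per-window probability from the independent-symbol SPD. Overlapping windows are not truly independent, so the binomial tail is only an approximation. All names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet

# Hypothetical predetermined set, for illustration only.
HYDROPHOBIC: FrozenSet[str] = frozenset("AVLIMFWC")


def to_symbols(seq: str, test_set: FrozenSet[str] = HYDROPHOBIC) -> str:
    """Map each residue to '1' if it is in the set, else '0'."""
    return "".join("1" if aa in test_set else "0" for aa in seq.upper())


def binom_upper_tail(x: int, n: int, q: float) -> float:
    """P(X >= x) for X ~ Binomial(n, q)."""
    return sum(math.comb(n, j) * q ** j * (1 - q) ** (n - j) for j in range(x, n + 1))


def significant_signals(seq: str, k: int, alpha: float = 0.05,
                        test_set: FrozenSet[str] = HYDROPHOBIC) -> Dict[str, float]:
    """Map each over-represented signal in one translation to its tail probability."""
    sym = to_symbols(seq, test_set)
    n = len(sym) - k + 1
    if n <= 0:
        return {}
    p = sym.count("1") / len(sym)
    observed = Counter(sym[i:i + k] for i in range(n))
    hits: Dict[str, float] = {}
    for w, x in observed.items():
        ones = w.count("1")
        q = p ** ones * (1 - p) ** (k - ones)  # per-window probability of w
        tail = binom_upper_tail(x, n, q)
        if tail < alpha:
            hits[w] = tail
    return hits
```

For example, a translation of ten set members followed by ten non-members is dominated by the runs `111` and `000`, which this sketch flags as significant.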